Skip to content

Remove stash slab allocator from the engine#4394

Draft
d-torrance wants to merge 6 commits into
Macaulay2:developmentfrom
d-torrance:remove-stash
Draft

Remove stash slab allocator from the engine#4394
d-torrance wants to merge 6 commits into
Macaulay2:developmentfrom
d-torrance:remove-stash

Conversation

@d-torrance

Copy link
Copy Markdown
Member

I had Claude do a little analysis of how the engine is using the code in the system directory, and it noticed that we're not using a bunch of code! The n_per_slab = 0 line that it mentions below was added in 2011...

Draft for now pending all the big engine PR's.

AI Disclosure

Pretty much 100% Claude. 🤖

It wrote the following PR message...

Summary

Remove the stash/slab/doubling_stash object pool allocator from M2/Macaulay2/e/ and replace all call sites with direct GC allocation (newarray_clear/freemem). Also remove several pieces of dead code in the engine that referenced the system/ concurrency directory.

Net change: −552 lines across 39 files, with mem.hpp and mem.cpp deleted entirely.

Background

stash was a fixed-size object pool designed to amortize allocation overhead for hot paths in Gröbner basis and resolution computation. The design allocated memory in large slabs, chopped them into equal-sized pieces, and kept a free list for reuse.

However, the slab allocator had been silently disabled: n_per_slab was hardcoded to 0 in the stash constructor, immediately overriding the calculated value. With n_per_slab == 0:

  • new_elem() reduced to return newarray_clear(char, element_size) — a plain GC allocation.
  • delete_elem() reduced to freemem(p) — a plain GC free.
  • The slab list, free list, and statistics fields were never touched.
  • The spinlock protecting the free list was initialised and (in the destructor) acquired, but never actually guarded anything — and the destructor never released it.
  • The doubling_stash global (doubles) was initialised in engine.cpp but doubles->new_elem() / doubles->delete_elem() were never called anywhere.

The stash class was therefore just a layer of indirection on top of newarray_clear/freemem with no effect on behaviour.

Changes

Dead code removed

  • NCAlgebras/NCF4.cpp: removed #include of supervisorinterface.h, which was only needed for a commented-out getAllowableThreads() debug line.
  • schreyer-resolution/res-f4.cpp: removed a #if 0 block containing experimental thread-task test code (testFcn1, testFcn2, testTasks) and its disabled call site in the F4Res constructor. The file already uses mtbb::parallel_for for its actual parallelism.

Stash removal

  • mem.hpp / mem.cpp: removed the slab, stash, and doubling_stash class definitions and their implementations. The files contained nothing else of use and have been deleted.
  • GB computation (gbring, gb-default, gb-toric, montable): stash->new_elem()newarray_clear(T, 1) or newarray_clear(char, size) for variable-size structs; stash->delete_elem(p)freemem(p).
  • Monomial ideal / Hilbert function (monideal, hilb, schreyer-resolution/res-f4-monlookup): same substitution, plus removal of the shared-stash parameter that threaded a stash * through constructors of MonomialIdeal, monideal_pair, partition_table, and ResF4MonomialLookupTableT. The sharing was a no-op since the pool was already just calling newarray_clear/freemem. Also cleaned up the count-field ownership bit in MonomialIdeal and ResF4MonomialLookupTableT that tracked whether an instance owned its stash.
  • Resolution code (res-a0, res-a1, res-a2 and their *-poly companions): same substitution. Note that res2_pair_stash in res-a0 was created but new_elem/delete_elem were never called on it.
  • Remaining references (engine.cpp, debug, finalize, f4/f4-computation, f4/f4-spairs, interface/monomial-ideal): removed stash::stats() calls, the dstash() debug function, the doubles = new doubling_stash initialisation, and stale #includes.
  • Build system (e/Makefile.files.in, d/Makefile.files.in, system/CMakeLists.txt): removed references to deleted files and updated a stale comment.

Possible future work

If allocation performance in the GB hot paths ever proves to be a bottleneck, the GC is not necessarily the right long-term allocator for these objects. The GC has to scan all live heap memory for pointers on every collection cycle, and its thread-safe allocation path carries overhead compared to a purpose-built pool.

However, replacing GC allocation here is not a straightforward drop-in. Many of the structs involved — gbvector, spair, Nmi_node, res_pair, and their relatives — contain ring_elem fields whose values are themselves GC-managed. Moving the containing struct off the GC heap would require either registering it as a GC root or restructuring the data so that coefficients are kept alive through some other means. Objects that are purely integer data (exponent vectors, raw monomials) have no such constraint and would be the easiest to migrate.

Two approaches are worth considering:

Arena/bump allocation per computation step. The existing f4/memblock.hpp and schreyer-resolution/res-memblock.hpp already implement this pattern: a bump allocator that is reset at the end of each degree step, with zero per-object free overhead. This is well-suited to objects whose lifetime is bounded by a single step of the GB algorithm. Extending this pattern to gbvector, spair, and similar types would be a natural next step, provided the coefficient lifetime issue is handled (e.g., by keeping a separate GC-visible list of live coefficients, or by moving to non-GC coefficient storage).

TBB scalable allocator. tbb::scalable_allocator (already a project dependency via m2tbb.hpp) uses thread-local caches and avoids the GC's collection overhead entirely. It is the better choice for objects with irregular lifetimes that do not fit a simple arena model, and would likely outperform the GC for the free-list-heavy patterns in monomial ideal tree nodes and resolution pair lists. The same caveat about GC-managed fields applies.

Either path requires profiling to confirm that allocation is actually the bottleneck before undertaking the work.

Testing

This PR makes no intentional behaviour change — every allocation that previously went through stash already went directly to the GC allocator. The standard test suite should be sufficient to confirm no regressions.

@d-torrance d-torrance added the Engine Macaulay2/e label Jun 1, 2026
d-torrance and others added 6 commits June 6, 2026 14:53
NCAlgebras/NCF4.cpp: remove #include of supervisorinterface.h, which was
only needed for a commented-out getAllowableThreads() debug call.

schreyer-resolution/res-f4.cpp: remove #if 0 block containing experimental
thread task test code (testFcn1, testFcn2, testTasks) and its disabled call
site in the F4Res constructor. This file already uses mtbb::parallel_for for
its actual parallelism; the supervisor-based test code was never enabled.

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
The stash/slab allocator in mem.hpp was a fixed-size object pool intended
to amortize malloc overhead. However, n_per_slab was hardcoded to 0 in the
constructor, disabling the slab machinery entirely: new_elem() called
newarray_clear() directly, and delete_elem() called freemem() directly.
The spinlock, free list, and slab linked list were all dead code. The
doubling_stash global was initialized but never used (no call sites for
doubles->new_elem/delete_elem).

Remove stash, slab, and doubling_stash entirely from mem.hpp/mem.cpp.
Replace all call sites in GB and monomial table code with direct calls to
newarray_clear() and freemem(), which is what the stash was already doing.

- mem.hpp/mem.cpp: remove slab, stash, doubling_stash; keep engine_alloc/dealloc
- gbring: new_raw_term() uses newarray_clear(char, gbvector_size); gbvector_remove uses freemem()
- gb-default: spair, gbelem, and lcm exponent arrays use newarray_clear/freemem directly
- gb-toric: monomials use newarray_clear(int, nslots)/freemem directly
- montable: mon_term nodes use newarray_clear/freemem directly

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
MonomialIdeal, partition_table (hilb), and ResF4MonomialLookupTableT all
used stash to allocate fixed-size tree nodes (Nmi_node / mi_node). They
also passed a shared stash pointer through constructors so that related
objects could draw from the same pool — but since the pool was already
just a wrapper around newarray_clear/freemem, the sharing had no effect
other than adding complexity.

Replace all stash usage with direct newarray_clear/freemem calls and
remove the stash parameter from all affected constructors and functions:

- MonomialIdeal: remove stash *mi_stash member; remove stash *mi_stash0
  parameters from all three constructors; clean up the count field, which
  encoded stash ownership in bit 0 (count = 2*size + owns_stash_bit) —
  now count is always even, so the debug_check condition count<=1 becomes
  count==0.
- monideal_pair: remove stash-passing constructor overload (both overloads
  now do the same thing).
- hilb_comp: remove stash *mi_stash member; remove from partition_table
  constructor and the iquotient_and_sum static function.
- ResF4MonomialLookupTableT: same pattern — remove stash *mi_stash member
  and parameter; clean up the count-based ownership bit in the destructor.
- minimalize_res_varpower_monomials: remove stash * parameter.

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
The classical resolution computations (res-a0, res-a1, res-a2) each created
their own stash objects for two purposes: allocating res_pair / res2_pair
nodes, and allocating Nmi_node objects that were then shared with
MonomialIdeal instances via the mi_stash parameter (now removed from
MonomialIdeal constructors in the previous commit).

- res-a0: remove stash *res2_pair_stash (was created but new_elem/delete_elem
  were never called — entirely dead) and stash *mi_stash; MonomialIdeal
  construction no longer takes a stash argument.
- res-a1: remove stash *res_pair_stash and stash *mi_stash; replace
  res_pair allocation with newarray_clear(res_pair, 1) and freemem().
- res-a0-poly / res-a1-poly: remove stash *resterm_stash; replace resterm /
  res2term allocation with newarray_clear(char, element_size) since these
  are variable-size structs (size includes inline monomial storage).
- res-a2 / res-a2-gb: gbres_comp owned the shared stash *mi_stash passed to
  all gb2_comp nodes; remove it from gbres_comp, gb2_comp constructor, and
  the gb2_comp::setup() helper. MonomialIdeal and monideal_pair construction
  no longer takes a stash argument.

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
Clean up the last sites that referenced the now-deleted stash class:

- engine.cpp: remove doubles = new doubling_stash initialization and the
  #include of mem.hpp (doubling_stash was initialized but never used).
- debug.cpp/hpp: remove dstash() function, which printed stash statistics
  (stash::stats()). The call site in interface/monomial-ideal.cpp is also
  removed (it was guarded by M2_gbTrace >= 1 but the output was always
  empty since statistics were never updated).
- finalize.cpp: remove stash::stats(o) call from engineMemory(); the
  function still exists but now returns an empty string, which is correct
  since the stash no longer accumulates statistics.
- f4/f4-computation.cpp: remove stash::stats(o) from F4Computation::show()
  and its now-unneeded #include of mem.hpp.
- f4/f4-spairs.cpp: remove #include of mem.hpp and an outdated comment
  that described the old stash-based memory strategy.

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
After the stash removal, mem.hpp contained only engine_alloc(),
engine_dealloc(), and the engine_allocated/engine_highwater counters —
none of which were called anywhere in the engine. mem.cpp only defined
the two counter variables. The files served no purpose.

Remove mem.hpp and mem.cpp entirely, and clean up all stale references:

- gbring.cpp, montable.hpp: remove #include "mem.hpp" (included only for stash)
- e/Makefile.files.in: remove 'mem' from the source list
- d/Makefile.files.in: remove mem.hpp from dependency lists for
  interface.o and engine.o; also remove the mutex.h dependency that was
  pulled in transitively through mem.hpp
- system/CMakeLists.txt: update comment on mutex.h — it is used by
  pthread0.d, not mem.hpp

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
monomial binomial_ring::new_monomial() const
{
return (monomial)((const binomial_ring *)this)->monstash->new_elem();
return newarray_clear(int, nslots);

Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Arrays of ints should really use newarray_atomic_clear. the atomic variants should be used when there are no pointers in the array

Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Not a big deal, since it's ifdef'd out anyways, but these aren't really related to the main change

Nmi_node *MonomialIdeal::new_internal_mi_node(int v, int e, Nmi_node *d)
{
Nmi_node *p = reinterpret_cast<Nmi_node *>(mi_stash->new_elem());
Nmi_node *p = newarray_clear(Nmi_node, 1);

Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Is there a reason not to use newitem_clear instead? The stash code only used newarray_clear since it only knows the size. Similarly for the other uses with newarray_clear(...,1)

{
void *p = mem->new_elem();
return (reinterpret_cast<gbvector *>(p));
return reinterpret_cast<gbvector *>(newarray_clear(char, gbvector_size));

Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

You could in theory use the GETMEM macro from e/newdelete.hpp for all of the ones with variable size. It calls the same code at the end of the day though.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

Engine Macaulay2/e

Projects

None yet

Development

Successfully merging this pull request may close these issues.

2 participants